https://docs.xarray.dev/en/stable/_static/dataset-diagram-logo.png

Xarray in 45 minutes#

In this lesson, we discuss cover the basics of Xarray data structures. By the end of the lesson, we will be able to:

  • Understand the basic data structures in Xarray

  • Inspect DataArray and Dataset objects.

  • Read and write netCDF files using Xarray.

  • Understand that there are many packages that build on top of xarray

We’ll start by reviewing the various components of the Xarray data model, represented here visually:

https://docs.xarray.dev/en/stable/_images/dataset-diagram.png
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

%matplotlib inline
%config InlineBackend.figure_format='retina'

Xarray has a few small real-world tutorial datasets hosted in this GitHub repository https://github.com/pydata/xarray-data

xarray.tutorial.load_dataset is a convenience function to download and open DataSets by name. Here we’ll use air temperature from National Centers for Environmental Prediction. Xarray objects have convenient HTML representations to give an overview of what we’re working with:

ds = xr.tutorial.load_dataset("air_temperature")
ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Note that behind the scenes the xarray.open_dataset function is opening this tutorial data with the “netCDF engine” because the data is stored in that format. A few things are done automatically upon opening, but controlled by keyword arguments. For example, try passing the keyword argument mask_and_scale=False… what happens?

What’s in a Dataset?#

Many DataArrays!

Datasets are dictionay-like containers of DataArrays. They are a mapping of variable name to DataArray:

# pull out "air" dataarray with dictionary syntax
ds["air"]
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
array([[[241.2    , 242.5    , 243.5    , ..., 232.79999, 235.5    ,
         238.59999],
        [243.79999, 244.5    , 244.7    , ..., 232.79999, 235.29999,
         239.29999],
        [250.     , 249.79999, 248.89   , ..., 233.2    , 236.39   ,
         241.7    ],
        ...,
        [296.6    , 296.19998, 296.4    , ..., 295.4    , 295.1    ,
         294.69998],
        [295.9    , 296.19998, 296.79   , ..., 295.9    , 295.9    ,
         295.19998],
        [296.29   , 296.79   , 297.1    , ..., 296.9    , 296.79   ,
         296.6    ]],

       [[242.09999, 242.7    , 243.09999, ..., 232.     , 233.59999,
         235.79999],
        [243.59999, 244.09999, 244.2    , ..., 231.     , 232.5    ,
         235.7    ],
        [253.2    , 252.89   , 252.09999, ..., 230.79999, 233.39   ,
         238.5    ],
...
        [293.69   , 293.88998, 295.38998, ..., 295.09   , 294.69   ,
         294.29   ],
        [296.29   , 297.19   , 297.59   , ..., 295.29   , 295.09   ,
         294.38998],
        [297.79   , 298.38998, 298.49   , ..., 295.69   , 295.49   ,
         295.19   ]],

       [[245.09   , 244.29   , 243.29   , ..., 241.68999, 241.48999,
         241.79   ],
        [249.89   , 249.29   , 248.39   , ..., 239.59   , 240.29   ,
         241.68999],
        [262.99   , 262.19   , 261.38998, ..., 239.89   , 242.59   ,
         246.29   ],
        ...,
        [293.79   , 293.69   , 295.09   , ..., 295.29   , 295.09   ,
         294.69   ],
        [296.09   , 296.88998, 297.19   , ..., 295.69   , 295.69   ,
         295.19   ],
        [297.69   , 298.09   , 298.09   , ..., 296.49   , 296.19   ,
         295.69   ]]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

You can save some typing by using the “attribute” or “dot” notation. This won’t work for variable names that clash with a built-in method name (like mean for example).

# pull out dataarray using dot notation
ds.air

What’s in a DataArray?#

data + (a lot of) metadata

Named dimensions#

.dims correspond to the axes of your data.

In this case we have 2 spatial dimensions (latitude and longitude are store with shorthand names lat and lon) and one temporal dimension (time).

ds.air.dims
('time', 'lat', 'lon')

Coordinate variables#

.coords is a simple data container for coordinate variables.

Here we see the actual timestamps and spatial positions of our air temperature data:

ds.air.coords
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00

Coordinates objects support similar indexing notation

# extracting coordinate variables
ds.air.lon
<xarray.DataArray 'lon' (lon: 53)>
array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,
       225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,
       250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,
       275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,
       300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,
       325. , 327.5, 330. ], dtype=float32)
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X
# extracting coordinate variables from .coords
ds.coords["lon"]
<xarray.DataArray 'lon' (lon: 53)>
array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,
       225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,
       250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,
       275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,
       300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,
       325. , 327.5, 330. ], dtype=float32)
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

It is useful to think of the values in these coordinate variables as axis “labels” such as “tick labels” in a figure. These are coordinate locations on a grid at which you have data.

Arbitrary attributes#

.attrs is a dictionary that can contain arbitrary Python objects (strings, lists, integers, dictionaries, etc.) Your only limitation is that some attributes may not be writeable to certain file formats.

ds.air.attrs
{'long_name': '4xDaily Air temperature at sigma level 995',
 'units': 'degK',
 'precision': 2,
 'GRIB_id': 11,
 'GRIB_name': 'TMP',
 'var_desc': 'Air temperature',
 'dataset': 'NMC Reanalysis',
 'level_desc': 'Surface',
 'statistic': 'Individual Obs',
 'parent_stat': 'Other',
 'actual_range': array([185.16, 322.1 ], dtype=float32)}
# assign your own attributes!
ds.air.attrs["who_is_awesome"] = "xarray"
ds.air.attrs
{'long_name': '4xDaily Air temperature at sigma level 995',
 'units': 'degK',
 'precision': 2,
 'GRIB_id': 11,
 'GRIB_name': 'TMP',
 'var_desc': 'Air temperature',
 'dataset': 'NMC Reanalysis',
 'level_desc': 'Surface',
 'statistic': 'Individual Obs',
 'parent_stat': 'Other',
 'actual_range': array([185.16, 322.1 ], dtype=float32),
 'who_is_awesome': 'xarray'}

Underlying data#

.data contains the numpy array storing air temperature values.

https://raw.githubusercontent.com/numpy/numpy/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/branding/logo/primary/numpylogo.svg

Xarray structures wrap underlying simpler data structures. This part of Xarray is quite extensible allowing for GPU arrays, sparse arrays, arrays with units etc. which we’ll look at later in this tutorial.

ds.air.data
array([[[241.2    , 242.5    , 243.5    , ..., 232.79999, 235.5    ,
         238.59999],
        [243.79999, 244.5    , 244.7    , ..., 232.79999, 235.29999,
         239.29999],
        [250.     , 249.79999, 248.89   , ..., 233.2    , 236.39   ,
         241.7    ],
        ...,
        [296.6    , 296.19998, 296.4    , ..., 295.4    , 295.1    ,
         294.69998],
        [295.9    , 296.19998, 296.79   , ..., 295.9    , 295.9    ,
         295.19998],
        [296.29   , 296.79   , 297.1    , ..., 296.9    , 296.79   ,
         296.6    ]],

       [[242.09999, 242.7    , 243.09999, ..., 232.     , 233.59999,
         235.79999],
        [243.59999, 244.09999, 244.2    , ..., 231.     , 232.5    ,
         235.7    ],
        [253.2    , 252.89   , 252.09999, ..., 230.79999, 233.39   ,
         238.5    ],
        ...,
        [296.4    , 295.9    , 296.19998, ..., 295.4    , 295.1    ,
         294.79   ],
        [296.19998, 296.69998, 296.79   , ..., 295.6    , 295.5    ,
         295.1    ],
        [296.29   , 297.19998, 297.4    , ..., 296.4    , 296.4    ,
         296.6    ]],

       [[242.29999, 242.2    , 242.29999, ..., 234.29999, 236.09999,
         238.7    ],
        [244.59999, 244.39   , 244.     , ..., 230.29999, 232.     ,
         235.7    ],
        [256.19998, 255.5    , 254.2    , ..., 231.2    , 233.2    ,
         238.2    ],
        ...,
        [295.6    , 295.4    , 295.4    , ..., 296.29   , 295.29   ,
         295.     ],
        [296.19998, 296.5    , 296.29   , ..., 296.4    , 296.     ,
         295.6    ],
        [296.4    , 296.29   , 296.4    , ..., 297.     , 297.     ,
         296.79   ]],

       ...,

       [[243.48999, 242.98999, 242.09   , ..., 244.18999, 244.48999,
         244.89   ],
        [249.09   , 248.98999, 248.59   , ..., 240.59   , 241.29   ,
         242.68999],
        [262.69   , 262.19   , 261.69   , ..., 239.39   , 241.68999,
         245.18999],
        ...,
        [294.79   , 295.29   , 297.49   , ..., 295.49   , 295.38998,
         294.69   ],
        [296.79   , 297.88998, 298.29   , ..., 295.49   , 295.49   ,
         294.79   ],
        [298.19   , 299.19   , 298.79   , ..., 296.09   , 295.79   ,
         295.79   ]],

       [[245.79   , 244.79   , 243.48999, ..., 243.29   , 243.98999,
         244.79   ],
        [249.89   , 249.29   , 248.48999, ..., 241.29   , 242.48999,
         244.29   ],
        [262.38998, 261.79   , 261.29   , ..., 240.48999, 243.09   ,
         246.89   ],
        ...,
        [293.69   , 293.88998, 295.38998, ..., 295.09   , 294.69   ,
         294.29   ],
        [296.29   , 297.19   , 297.59   , ..., 295.29   , 295.09   ,
         294.38998],
        [297.79   , 298.38998, 298.49   , ..., 295.69   , 295.49   ,
         295.19   ]],

       [[245.09   , 244.29   , 243.29   , ..., 241.68999, 241.48999,
         241.79   ],
        [249.89   , 249.29   , 248.39   , ..., 239.59   , 240.29   ,
         241.68999],
        [262.99   , 262.19   , 261.38998, ..., 239.89   , 242.59   ,
         246.29   ],
        ...,
        [293.79   , 293.69   , 295.09   , ..., 295.29   , 295.09   ,
         294.69   ],
        [296.09   , 296.88998, 297.19   , ..., 295.69   , 295.69   ,
         295.19   ],
        [297.69   , 298.09   , 298.09   , ..., 296.49   , 296.19   ,
         295.69   ]]], dtype=float32)
# what is the type of the underlying data
type(ds.air.data)
numpy.ndarray

Review#

Xarray provides two main data structures:

  1. DataArrays that wrap underlying data containers (e.g. numpy arrays) and contain associated metadata

  2. DataSets that are dictionary-like containers of DataArrays


Why Xarray?#

Metadata provides context and provides code that is more legible. This reduces the likelihood of errors from typos and makes analysis more intuitive and fun!

Analysis without xarray: X(#

# plot the first timestep
lat = ds.air.lat.data  # numpy array
lon = ds.air.lon.data  # numpy array
temp = ds.air.data  # numpy array
plt.figure()
plt.pcolormesh(lon, lat, temp[0, :, :]);
../_images/xarray-in-45-min_25_0.png
temp.mean(axis=1)  ## what did I just do? I can't tell by looking at this line.
array([[279.39798, 279.6664 , 279.66122, ..., 279.9508 , 280.31522,
        280.6624 ],
       [279.05722, 279.538  , 279.7296 , ..., 279.77563, 280.27002,
        280.79764],
       [279.0104 , 279.2808 , 279.5508 , ..., 279.682  , 280.19763,
        280.81403],
       ...,
       [279.63   , 279.934  , 280.534  , ..., 279.802  , 280.346  ,
        280.77798],
       [279.398  , 279.66602, 280.31796, ..., 279.766  , 280.34198,
        280.834  ],
       [279.27   , 279.354  , 279.88202, ..., 279.42596, 279.96997,
        280.48196]], dtype=float32)

Analysis with xarray =)#

How readable is this code?

ds.air.isel(time=1).plot(x="lon");
../_images/xarray-in-45-min_28_0.png

Use dimension names instead of axis numbers

ds.air.mean("time")
<xarray.DataArray 'air' (lat: 25, lon: 53)>
array([[260.37564, 260.1826 , 259.88593, ..., 250.81511, 251.93733,
        253.43741],
       [262.7337 , 262.7936 , 262.7489 , ..., 249.75496, 251.5852 ,
        254.35849],
       [264.7681 , 264.3271 , 264.0614 , ..., 250.60707, 253.58247,
        257.71475],
       ...,
       [297.64932, 296.95294, 296.62912, ..., 296.81033, 296.28793,
        295.81622],
       [298.1287 , 297.93646, 297.47006, ..., 296.8591 , 296.77686,
        296.44348],
       [298.36594, 298.38593, 298.11386, ..., 297.33777, 297.28104,
        297.30502]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0

Extracting data or “indexing”#

Xarray supports

  • label-based indexing using .sel

  • position-based indexing using .isel

See the documentation for more: https://docs.xarray.dev/en/stable/indexing.html

Label-based indexing#

Xarray inherits its label-based indexing rules from pandas; this means great support for dates and times!

# pull out data for all of 2013-May
ds.sel(time="2013-05")
<xarray.Dataset>
Dimensions:  (lat: 25, time: 124, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-05-01 ... 2013-05-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 259.2 259.3 259.1 ... 298.2 297.6 297.5
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
# demonstrate slicing
ds.sel(time=slice("2013-05", "2013-07"))
<xarray.Dataset>
Dimensions:  (lat: 25, time: 368, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-05-01 ... 2013-07-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 259.2 259.3 259.1 ... 299.4 299.5 299.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
# demonstrate "nearest" indexing
ds.sel(lon=240.2, method="nearest")
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
    lon      float32 240.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat) float32 239.6 237.2 240.1 249.0 ... 294.8 296.9 298.4
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
# "nearest indexing at multiple points"
ds.sel(lon=[240.125, 234], lat=[40.3, 50.3], method="nearest")
<xarray.Dataset>
Dimensions:  (lat: 2, time: 2920, lon: 2)
Coordinates:
  * lat      (lat) float32 40.0 50.0
  * lon      (lon) float32 240.0 235.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 268.1 283.0 265.5 ... 285.2 256.8 268.6
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Position-based indexing#

This is similar to your usual numpy array[0, 2, 3] but with the power of named dimensions!

# pull out time index 0 and lat index 0
ds.air.isel(time=0, lat=0)  #  much better than ds.air[0, 0, :]
<xarray.DataArray 'air' (lon: 53)>
array([241.2    , 242.5    , 243.5    , 244.     , 244.09999, 243.89   ,
       243.59999, 243.09999, 242.5    , 241.89   , 241.2    , 240.29999,
       239.5    , 238.79999, 238.5    , 238.7    , 239.59999, 241.     ,
       242.89   , 244.79999, 246.5    , 247.79999, 248.59999, 249.     ,
       249.09999, 249.09999, 249.     , 248.89   , 248.7    , 248.59999,
       248.39   , 248.29999, 248.29999, 248.59999, 249.     , 249.5    ,
       249.59999, 249.09999, 247.79999, 245.39   , 242.2    , 238.5    ,
       234.7    , 231.29999, 228.79999, 227.29999, 227.     , 227.5    ,
       228.79999, 230.59999, 232.79999, 235.5    , 238.59999],
      dtype=float32)
Coordinates:
    lat      float32 75.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
    time     datetime64[ns] 2013-01-01
Attributes:
    long_name:       4xDaily Air temperature at sigma level 995
    units:           degK
    precision:       2
    GRIB_id:         11
    GRIB_name:       TMP
    var_desc:        Air temperature
    dataset:         NMC Reanalysis
    level_desc:      Surface
    statistic:       Individual Obs
    parent_stat:     Other
    actual_range:    [185.16 322.1 ]
    who_is_awesome:  xarray
# demonstrate slicing
ds.air.isel(lat=slice(10))
<xarray.DataArray 'air' (time: 2920, lat: 10, lon: 53)>
array([[[241.2    , 242.5    , 243.5    , ..., 232.79999, 235.5    ,
         238.59999],
        [243.79999, 244.5    , 244.7    , ..., 232.79999, 235.29999,
         239.29999],
        [250.     , 249.79999, 248.89   , ..., 233.2    , 236.39   ,
         241.7    ],
        ...,
        [274.79   , 275.19998, 275.6    , ..., 277.19998, 277.     ,
         277.     ],
        [275.9    , 276.9    , 276.9    , ..., 280.9    , 280.5    ,
         279.69998],
        [276.69998, 277.4    , 277.69998, ..., 283.29   , 284.1    ,
         283.9    ]],

       [[242.09999, 242.7    , 243.09999, ..., 232.     , 233.59999,
         235.79999],
        [243.59999, 244.09999, 244.2    , ..., 231.     , 232.5    ,
         235.7    ],
        [253.2    , 252.89   , 252.09999, ..., 230.79999, 233.39   ,
         238.5    ],
...
        [275.59   , 276.29   , 277.49   , ..., 275.19   , 275.79   ,
         276.59   ],
        [276.88998, 277.88998, 278.69   , ..., 273.59   , 274.29   ,
         275.29   ],
        [276.79   , 277.29   , 278.29   , ..., 274.19   , 275.38998,
         276.88998]],

       [[245.09   , 244.29   , 243.29   , ..., 241.68999, 241.48999,
         241.79   ],
        [249.89   , 249.29   , 248.39   , ..., 239.59   , 240.29   ,
         241.68999],
        [262.99   , 262.19   , 261.38998, ..., 239.89   , 242.59   ,
         246.29   ],
        ...,
        [274.29   , 274.49   , 275.59   , ..., 274.69   , 274.99   ,
         275.38998],
        [276.79   , 277.49   , 277.99   , ..., 273.19   , 273.59   ,
         274.19   ],
        [276.88998, 277.29   , 277.59   , ..., 273.79   , 274.99   ,
         276.19   ]]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 55.0 52.5
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:       4xDaily Air temperature at sigma level 995
    units:           degK
    precision:       2
    GRIB_id:         11
    GRIB_name:       TMP
    var_desc:        Air temperature
    dataset:         NMC Reanalysis
    level_desc:      Surface
    statistic:       Individual Obs
    parent_stat:     Other
    actual_range:    [185.16 322.1 ]
    who_is_awesome:  xarray

Concepts for computation#

Consider calculating the mean air temperature per unit surface area for this dataset. Because latitude and longitude correspond to spherical coordinates for Earth’s surface, each 2.5x2.5 degree grid cell actually has a different surface area as you move away from the equator! This is because latitudinal length is fixed (\( \delta Lat = R \delta \phi \)), but longitudinal length varies with latitude (\( \delta Lon = R \delta \lambda \cos(\phi) \))

So the area element for lat-lon coordinates is

\[ \delta A = R^2 \delta \phi \delta \lambda \cos(\phi) \]

where \(\phi\) is latitude, \(\delta \phi\) is the spacing of the points in latitude, \(\delta \lambda\) is the spacing of the points in longitude, and \(R\) is Earth’s radius. (In this formula, \(\phi\) and \(\lambda\) are measured in radians)

# Earth's average radius
R = 6.371e6

# Coordinate spacing for this dataset is 2.5 x 2.5 degrees
 = np.deg2rad(2.5)
 = np.deg2rad(2.5)

dlat = R *  * xr.ones_like(ds.air.lon)
dlon = R *  * np.cos(np.deg2rad(ds.air.lat))
# cell latitude length is constant with longitude
dlat
<xarray.DataArray 'lon' (lon: 53)>
array([277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3, 277987.3,
       277987.3, 277987.3, 277987.3, 277987.3, 277987.3], dtype=float32)
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X
# cell longitude length changes with latitude
dlon
<xarray.DataArray 'lat' (lat: 25)>
array([ 71948.42 ,  83592.41 ,  95077.266, 106381.15 , 117482.516,
       128360.25 , 138993.64 , 149362.47 , 159446.97 , 169227.95 ,
       178686.8  , 187805.5  , 196566.7  , 204953.75 , 212950.64 ,
       220542.16 , 227713.88 , 234452.11 , 240744.06 , 246577.75 ,
       251942.06 , 256826.78 , 261222.62 , 265121.22 , 268515.12 ],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
Attributes:
    standard_name:  latitude
    long_name:      Latitude
    units:          degrees_north
    axis:           Y

Broadcasting: expanding data#

Our longitude and latitude length DataArrays are both 1D with different dimension names. If we multiple these DataArrays together the dimensionality is expanded to 2D via broadcasting:

cell_area = dlon * dlat
cell_area
<xarray.DataArray (lat: 25, lon: 53)>
array([[2.0000748e+10, 2.0000748e+10, 2.0000748e+10, ..., 2.0000748e+10,
        2.0000748e+10, 2.0000748e+10],
       [2.3237628e+10, 2.3237628e+10, 2.3237628e+10, ..., 2.3237628e+10,
        2.3237628e+10, 2.3237628e+10],
       [2.6430274e+10, 2.6430274e+10, 2.6430274e+10, ..., 2.6430274e+10,
        2.6430274e+10, 2.6430274e+10],
       ...,
       [7.2616575e+10, 7.2616575e+10, 7.2616575e+10, ..., 7.2616575e+10,
        7.2616575e+10, 7.2616575e+10],
       [7.3700336e+10, 7.3700336e+10, 7.3700336e+10, ..., 7.3700336e+10,
        7.3700336e+10, 7.3700336e+10],
       [7.4643800e+10, 7.4643800e+10, 7.4643800e+10, ..., 7.4643800e+10,
        7.4643800e+10, 7.4643800e+10]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0

The result has two dimensions because xarray realizes that dimensions lon and lat are different so it automatically “broadcasts” to get a 2D result. See the last row in this image from Jake VanderPlas Python Data Science Handbook

https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png

Because xarray knows about dimension names we avoid having to create unnecessary size-1 dimensions using np.newaxis or .reshape. For more, see https://docs.xarray.dev/en/stable/user-guide/computation.html#broadcasting-by-dimension-name


Alignment: putting data on the same grid#

When doing arithmetic operations xarray automatically “aligns” i.e. puts the data on the same grid. In this case cell_area and ds.air are at the same lat, lon points we end up with a result with the same shape (25x53):

ds.air.isel(time=1) / cell_area
<xarray.DataArray (lat: 25, lon: 53)>
array([[1.2104547e-08, 1.2134547e-08, 1.2154545e-08, ..., 1.1599567e-08,
        1.1679563e-08, 1.1789559e-08],
       [1.0482998e-08, 1.0504514e-08, 1.0508818e-08, ..., 9.9407735e-09,
        1.0005325e-08, 1.0143031e-08],
       [9.5799235e-09, 9.5681942e-09, 9.5383044e-09, ..., 8.7324104e-09,
        8.8304040e-09, 9.0237435e-09],
       ...,
       [4.0817127e-09, 4.0748271e-09, 4.0789585e-09, ..., 4.0679415e-09,
        4.0638106e-09, 4.0595416e-09],
       [4.0189776e-09, 4.0257615e-09, 4.0269832e-09, ..., 4.0108366e-09,
        4.0094799e-09, 4.0040522e-09],
       [3.9693853e-09, 3.9815764e-09, 3.9842556e-09, ..., 3.9708588e-09,
        3.9708588e-09, 3.9735384e-09]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
    time     datetime64[ns] 2013-01-01T06:00:00

Now lets make cell_area unaligned i.e. change the coordinate labels

# make a copy of cell_area
# then add 1e-5 degrees to latitude
cell_area_bad = cell_area.copy(deep=True)
cell_area_bad["lat"] = cell_area.lat + 1e-5  # latitudes are off by 1e-5 degrees!
cell_area_bad
<xarray.DataArray (lat: 25, lon: 53)>
array([[2.0000748e+10, 2.0000748e+10, 2.0000748e+10, ..., 2.0000748e+10,
        2.0000748e+10, 2.0000748e+10],
       [2.3237628e+10, 2.3237628e+10, 2.3237628e+10, ..., 2.3237628e+10,
        2.3237628e+10, 2.3237628e+10],
       [2.6430274e+10, 2.6430274e+10, 2.6430274e+10, ..., 2.6430274e+10,
        2.6430274e+10, 2.6430274e+10],
       ...,
       [7.2616575e+10, 7.2616575e+10, 7.2616575e+10, ..., 7.2616575e+10,
        7.2616575e+10, 7.2616575e+10],
       [7.3700336e+10, 7.3700336e+10, 7.3700336e+10, ..., 7.3700336e+10,
        7.3700336e+10, 7.3700336e+10],
       [7.4643800e+10, 7.4643800e+10, 7.4643800e+10, ..., 7.4643800e+10,
        7.4643800e+10, 7.4643800e+10]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
cell_area_bad * ds.air.isel(time=1)
<xarray.DataArray (lat: 0, lon: 53)>
array([], shape=(0, 53), dtype=float32)
Coordinates:
  * lat      (lat) float64 
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
    time     datetime64[ns] 2013-01-01T06:00:00

The result is an empty array with no latitude coordinates because none of them were aligned!

Tip: If you notice extra NaNs or missing points after xarray computation, it means that your xarray coordinates were not aligned exactly.

For more, see https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment


High level computation#

(groupby, resample, rolling, coarsen, weighted)

Xarray has some very useful high level objects that let you do common computations:

  1. groupby : Bin data in to groups and reduce

  2. resample : Groupby specialized for time axes. Either downsample or upsample your data.

  3. rolling : Operate on rolling windows of your data e.g. running mean

  4. coarsen : Downsample your data

  5. weighted : Weight your data before reducing

groupby#

# seasonal groups
ds.groupby("time.season")
DatasetGroupBy, grouped over 'season'
4 groups with labels 'DJF', 'JJA', 'MAM', 'SON'.
# make a seasonal mean
seasonal_mean = ds.groupby("time.season").mean()
seasonal_mean
<xarray.Dataset>
Dimensions:  (lat: 25, season: 4, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Data variables:
    air      (season, lat, lon) float32 247.0 247.0 246.7 ... 299.4 299.4 299.5

The seasons are out of order (they are alphabetically sorted). This is a common annoyance. The solution is to use .reindex

seasonal_mean = seasonal_mean.reindex(season=["DJF", "MAM", "JJA", "SON"])
seasonal_mean
<xarray.Dataset>
Dimensions:  (season: 4, lat: 25, lon: 53)
Coordinates:
  * season   (season) <U3 'DJF' 'MAM' 'JJA' 'SON'
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (season, lat, lon) float32 247.0 247.0 246.7 ... 299.4 299.4 299.5

resample#

# resample to monthly frequency
ds.resample(time="M").mean()
<xarray.Dataset>
Dimensions:  (time: 24, lat: 25, lon: 53)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-31 2013-02-28 ... 2014-12-31
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (time, lat, lon) float32 244.5 244.7 244.7 ... 297.7 297.7 297.7

weighted#

# weight by cell_area and take mean over (time, lon)
ds.weighted(cell_area).mean(["lon", "time"]).air.plot();
../_images/xarray-in-45-min_62_0.png

Visualization#

(.plot)

For more see https://docs.xarray.dev/en/stable/plotting.html and https://docs.xarray.dev/en/stable/examples/visualization_gallery.html

We have seen very simple plots earlier. Xarray has some support for visualizing 3D and 4D datasets by presenting multiple facets (or panels or subplots) showing variations across rows and/or columns.

# facet the seasonal_mean
seasonal_mean.air.plot(col="season");
../_images/xarray-in-45-min_64_0.png
# contours
seasonal_mean.air.plot.contour(col="season", levels=20, add_colorbar=True);
../_images/xarray-in-45-min_65_0.png
# line plots too? wut
seasonal_mean.air.mean("lon").plot.line(hue="season", y="lat");
../_images/xarray-in-45-min_66_0.png
ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Reading and writing files#

Xarray supports many disk formats. Below is a small example using netCDF. For more see https://docs.xarray.dev/en/stable/user-guide/io.html

# write to netCDF
ds.to_netcdf("my-example-dataset.nc")
/tmp/ipykernel_3388/1659397704.py:2: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs
  ds.to_netcdf("my-example-dataset.nc")

Note

To avoid the SerializationWarning you can assign a _FillValue for any NaNs in ‘air’ array by adding the keyword argument encoding=dict(air={_FillValue=-9999})

# read from disk
fromdisk = xr.open_dataset("my-example-dataset.nc")
fromdisk
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
# check that the two are identical
ds.identical(fromdisk)
True

Tip: A common use case to read datasets that are a collection of many netCDF files. See https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets for how to handle that


The scientific python ecosystem#

Xarray ties in to the larger scientific python ecosystem and in turn many packages build on top of xarray. A long list of such packages is here: https://docs.xarray.dev/en/stable/related-projects.html.

Now we will demonstrate some cool features.

Pandas: tabular data structures#

You can easily convert between xarray and pandas structures: https://pandas.pydata.org/

This allows you to conveniently use the extensive pandas ecosystem of packages (like seaborn) for your work.

See https://docs.xarray.dev/en/stable/pandas.html

# convert to pandas dataframe
df = ds.isel(time=slice(10)).to_dataframe()
df
air
lat time lon
75.0 2013-01-01 00:00:00 200.0 241.199997
202.5 242.500000
205.0 243.500000
207.5 244.000000
210.0 244.099991
... ... ... ...
15.0 2013-01-03 06:00:00 320.0 297.000000
322.5 297.290009
325.0 296.899994
327.5 296.790009
330.0 297.100006

13250 rows × 1 columns

# convert dataframe to xarray
df.to_xarray()
<xarray.Dataset>
Dimensions:  (lat: 25, time: 10, lon: 53)
Coordinates:
  * lat      (lat) float64 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2013-01-03T06:00:00
  * lon      (lon) float64 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (lat, time, lon) float32 241.2 242.5 243.5 ... 296.9 296.8 297.1

Numpy alternatives#

Xarray can wrap other array types! For example:

https://docs.dask.org/en/latest/_static/images/dask-horizontal-white.svg

dask : parallel arrays https://docs.xarray.dev/en/stable/user-guide/dask.html & https://docs.dask.org/en/latest/array.html

https://sparse.pydata.org/en/stable/_images/logo.png

pydata/sparse : sparse arrays https://sparse.pydata.org

https://raw.githubusercontent.com/cupy/cupy.dev/master/images/cupy_logo.png

cupy : GPU arrays https://cupy.dev

https://pint.readthedocs.io/en/stable/_images/logo-full.jpg

pint : unit-aware computations https://pint.readthedocs.io & https://github.com/xarray-contrib/pint-xarray

Dask#

Dask cuts up NumPy arrays into blocks and parallelizes your analysis code across these blocks

https://raw.githubusercontent.com/dask/dask/main/docs/source/images/dask-array.svg
# demonstrate dask dataset
dasky = xr.tutorial.open_dataset(
    "air_temperature",
    chunks={"time": 10},  # 10 time steps in each block
)

dasky.air
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
dask.array<open_dataset-7d0bf76ab064774e1166f6b90c9d9765air, shape=(2920, 25, 53), dtype=float32, chunksize=(10, 25, 53), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

All computations with dask-backed xarray objects are lazy, allowing you to build up a complicated chain of analysis steps quickly

# demonstrate lazy mean
dasky.air.mean("lat")
<xarray.DataArray 'air' (time: 2920, lon: 53)>
dask.array<mean_agg-aggregate, shape=(2920, 53), dtype=float32, chunksize=(10, 53), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00

To get concrete values, call .compute or .load

# "compute" the mean
dasky.air.mean("lat").compute()
<xarray.DataArray 'air' (time: 2920, lon: 53)>
array([[279.39798, 279.6664 , 279.66122, ..., 279.9508 , 280.31522,
        280.6624 ],
       [279.05722, 279.538  , 279.7296 , ..., 279.77563, 280.27002,
        280.79764],
       [279.0104 , 279.2808 , 279.5508 , ..., 279.682  , 280.19763,
        280.81403],
       ...,
       [279.63   , 279.934  , 280.534  , ..., 279.802  , 280.346  ,
        280.77798],
       [279.398  , 279.66602, 280.31796, ..., 279.766  , 280.34198,
        280.834  ],
       [279.27   , 279.354  , 279.88202, ..., 279.42596, 279.96997,
        280.48196]], dtype=float32)
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00

HoloViz#

Quickly generate interactive plots from your data!

The hvplot package attaches itself to all xarray objects under the .hvplot namespace. So instead of using .plot use .hvplot

import hvplot.xarray

ds.air.hvplot(groupby="time", clim=(270, 300), widget_location='bottom')

Note

The time slider will only work if you’re executing the notebook, rather than viewing the website

cf_xarray#

cf_xarray is a project that tries to let you make use of other CF attributes that xarray ignores. It attaches itself to all xarray objects under the .cf namespace.

Where xarray allows you to specify dimension names for analysis, cf_xarray lets you specify logical names like "latitude" or "longitude" instead as long as the appropriate CF attributes are set.

For example, the "longitude" dimension in different files might be labelled as: (lon, LON, long, x…), but cf_xarray let’s you always refer to the logical name "longitude" in your code:

import cf_xarray
# describe cf attributes in dataset
ds.air.cf.describe()
Coordinates:
- CF Axes: * X: ['lon']
           * Y: ['lat']
           * T: ['time']
             Z: n/a

- CF Coordinates: * longitude: ['lon']
                  * latitude: ['lat']
                  * time: ['time']
                    vertical: n/a

- Cell Measures:   area, volume: n/a

- Standard Names: * latitude: ['lat']
                  * longitude: ['lon']
                  * time: ['time']

- Bounds:   n/a

The following mean operation will work with any dataset that has appropriate attributes set that allow detection of the “latitude” variable (e.g. units: "degress_north" or standard_name: "latitude")

# demonstrate equivalent of .mean("lat")
ds.air.cf.mean("latitude")
<xarray.DataArray 'air' (time: 2920, lon: 53)>
array([[279.39798, 279.6664 , 279.66122, ..., 279.9508 , 280.31522,
        280.6624 ],
       [279.05722, 279.538  , 279.7296 , ..., 279.77563, 280.27002,
        280.79764],
       [279.0104 , 279.2808 , 279.5508 , ..., 279.682  , 280.19763,
        280.81403],
       ...,
       [279.63   , 279.934  , 280.534  , ..., 279.802  , 280.346  ,
        280.77798],
       [279.398  , 279.66602, 280.31796, ..., 279.766  , 280.34198,
        280.834  ],
       [279.27   , 279.354  , 279.88202, ..., 279.42596, 279.96997,
        280.48196]], dtype=float32)
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
# demonstrate indexing
ds.air.cf.sel(longitude=242.5, method="nearest")
<xarray.DataArray 'air' (time: 2920, lat: 25)>
array([[241.     , 238.     , 239.7    , ..., 292.     , 293.9    ,
        296.79   ],
       [240.     , 238.39   , 241.09999, ..., 292.6    , 294.1    ,
        296.69998],
       [240.7    , 238.89   , 240.79999, ..., 292.29   , 293.4    ,
        296.1    ],
       ...,
       [241.79   , 243.48999, 246.48999, ..., 294.69   , 296.69   ,
        298.49   ],
       [239.89   , 241.68999, 242.29   , ..., 295.09   , 296.88998,
        298.59   ],
       [239.59   , 241.48999, 240.79   , ..., 295.19   , 296.79   ,
        298.88998]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
    lon      float32 242.5
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:       4xDaily Air temperature at sigma level 995
    units:           degK
    precision:       2
    GRIB_id:         11
    GRIB_name:       TMP
    var_desc:        Air temperature
    dataset:         NMC Reanalysis
    level_desc:      Surface
    statistic:       Individual Obs
    parent_stat:     Other
    actual_range:    [185.16 322.1 ]
    who_is_awesome:  xarray

Other cool packages#

  • xgcm : grid-aware operations with xarray objects

  • xrft : fourier transforms with xarray

  • xclim : calculating climate indices with xarray objects

  • intake-xarray : forget about file paths

  • rioxarray : raster files and xarray

  • xesmf : regrid using ESMF

  • MetPy : tools for working with weather data

Check Xarray documentation for even more! https://docs.xarray.dev/en/stable/related-projects.html

More information#

  1. A description of common terms used in the xarray documentation: https://docs.xarray.dev/en/stable/terminology.html

  2. For information on how to create a DataArray from an existing numpy array: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#creating-a-dataarray

  3. Answers to common questions on “how to do X” are here: https://docs.xarray.dev/en/stable/howdoi.html

  4. Ryan Abernathey has a book on data analysis with a chapter on Xarray: https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html